Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Metadata Support to Pluginspace #467

Merged
merged 50 commits into from
Jul 5, 2023

Conversation

jcjgraf
Copy link
Contributor

@jcjgraf jcjgraf commented Feb 25, 2022

Desperate attempt to get metadata to work nr 148 😋

We have had so many troubles to get metadata support and none of the available solutions seemed perfect. To quickly summarize to issues we have had:

  • No formatted output
  • Limited image type support
  • Not in official repository
  • Not even in aur
  • Many build deps
  • Generally broken
  • Deps break CI
  • Package no longer maintained
  • etc.

So I thought that is is probably the best idea to move the "dirty code" out to plugin space. More precisely, my idea was to remove all metadata related code which is fixed to a certain package, and instead add an interface which allows to load plugins which add metadata support. I would create onetwo "official"/internal plugin for metadata support based on piexif and pyexiv2, and then anyone is free to implement metadata plugins using whatever method one likes.

This PR is very WIP. Almost complete. Any feedback is very welcome

Update: I will maintain the TODOs in order to keep the overview...

TODOs

  • Main Implementation

    • Abstract class with interfaces for for get_metadata, get_keys, copy_metadata, and get_date_time
    • Metadata plugin registration functionality
  • Metadata Plugins

    • With piexif backend
      • Provide backward compatibility for having the long keys
    • With pyexiv2 backend
      • Do not provide backward compatibility for having short keys, as this is really ugly and confusing
  • Configuration

    • Default backed loading
      • For backward compatibility, a default backed should be loaded
      • Handled by plugin metadata
        • Should be possible overwrite default and have (only) a specific plugin loaded
          • The default backend should not be loaded at all
        • Should be possible overwrite default and have not metadata plugin loaded
    • Should keys be case-insensitive?
      • Currently they are not, and I do not see a reason why they should not be (but maybe there is?)
  • Handle case where there is no metadata support

    • MetadataWidget should not be loaded
      • Done using the plugins_loaded signal (thanks @karlch!)
  • Testing

    • Have been updated (thanks @karlch!)
    • More Tests (@karlch)?
    • Anything else (@karlch)?
    • There are still a few suboptimal things. See:

The integration tests unfortunately still require both marking and using the corresponding fixture, in case we test piexif / pyexiv2 explicitly as I couldn't figure out how to mark a fixture (instead of the tests) quickly. Could definitely be improved.

  • CI

    • pyexiv2 has been removed (in a seperate PR)
    • Anything else (@karlch)?
  • Docs

    • Detailed section about metadata
    • Change exif to metadata where applicable
    • How should user metadata plugins be listed?
    • Document default behaviour (what happens when no backend is explicitely loaded)
  • Misc

    • Change exif to metadata in code and comments where applicable

Deferred for another PR

  • Internal metadata keys (ref Internal metadata keys #325)

    • Will be implemented as a separate metadata plugin
    • No special functionality (interface) needed to make it work
  • Display of metadata plugin (backend) version in :version/--version

    • Interface (imutils.metadata.get_registrations) has been added
  • Deprecate default metadata support?

  • Interface for writing metadata

@jcjgraf jcjgraf mentioned this pull request Mar 31, 2022
karlch added a commit that referenced this pull request Apr 13, 2022
Python 3.6 has reached its end of life quite a while ago and is thus no
longer properly supported by many of our dependencies. CI is also
currently broken for python3.6 with the "newer" PyQt5 versions.

In the future we need to also add the newer python versions, i.e. 3.9
and 3.10, to our CI pipeline. However, this is not straightforward due
to py3exiv2 and will only be done after #467 is merged.
@karlch
Copy link
Owner

karlch commented Jan 3, 2023

While I won't be able to do a detailed code review quite yet, I do have some initial thoughts (that we probably have shared over mail at some point) and at least CI / dependencies have been fixed in master by now, so a quick rebase would be great 😊

I absolutely love the idea of getting the metadata havoc out of mainline, it has been quite an ordeal, you are right, and thanks again for all your hard work on this! However, I do believe we somehow need to support pyexiv2, given we asked users to switch "recently". Maybe you could just quickly recycle the existing code into another plugin with corresponding name, so we have both in the codebase for now? With this + the rebase, I should absolutely do a full review and we can discuss the details.

Later on I can take care of any CI / testing if you like, I feel like this will be messy 😂 but currently all should be fine, as we never test against pyexiv2 (on CI).

@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jan 5, 2023

I have rebased for latest Master. Also, I have rewritten most code, as my initial implementation was a bit limiting, as it allowed only for one metadata handler to be present at the time.

The new implementation is a bit more complex in therms of code, but gives more freedom to users, in regards on how they want to handle metadata.
Using @exif.register(exif.Methods.XXX) users can register a function as a certain implementations of a metadata function (i.e for get_formatted_metadata, get_keys, get_date_time, copy_metadata). For all functions, with the exception of get_date_time, multiple implementations can be registered. The output of the gets combined.
This approach trivially solves #325, as I can create simply create another plugin, that registers some functions for get_keys and get_formatted_metadata.

exif.MetadataHandler is the class used to access metadata of an image (e.g. by the MetadataWidget). When one of its methods is called, exif._MetadataRegistry is accessed to load all available function implementations (@exif.register stores implementations in this registry).

As suggested, I have also created a plugin for metadata support using py3exif2 (which is, however, completely untested (and might even crash) as I am not able to get py3exif2 to run currently).

The latest commit is very WIP. I still need to do a lot of cleanup, fix docstrings, fix namings, and minor todos. Also, I have not looked into testing/CI yet (therefore the CI also fails).
However, it would be great if you could provide me with some (high-level) feedback. How you like the general idea and code pattern, or can you come up with something more slick?

@karlch
Copy link
Owner

karlch commented Jan 5, 2023

This is golden! I'll only leave a quick overall impression, not any detailed inline comments as of now then.

First of all, I absolutely love the flexibility this gives. So following are some random notes, in no particular order:

  • There are other places around the code (imutils) that use ExifHandler, these should also use MetadataHandler I guess.
  • All the if len(...) == 0 checks could be replaced by a simple if ..., but even then I guess we will need a wrapper for clients, i.e. self.has_formatted_metadata as property of the registry class or even as module-level functions, which would allow gui.metadatawidget call exif.has_formatted_metadata() in advance (assuming the importing is done in the correct order).
  • In general, I like how flexible the registry decorators are, but do we need them all? I guess at least the raw metadata part can be removed, as it is unused.
  • I guess the metadata retrieval of the current path should be cached in each plugin. Probably a simple functools.lru_cache(1) is enough.

Random thoughts over :D Once again, love it, and do think the overall structure makes a lot of sense. (Code) details to be checked later 😊

@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jan 6, 2023

Thanks for the high-level feedback. I am glad you like the overall structure.

Some comments on your points:

  1. True, thanks!
  2. a) Often I prefer to be explicit and avert using truthy/falsy values as it can lead to nasty bugs. But I will change it.
    b) I like the idea of using module-level functions. I tried something similar, but the issue is that gui.metadatawidget is loaded before the plugins are loaded (startup.py imports gui.mainwindows, which imports the gui.metadatawidget, while plugins are loaded in setup_pre_app). Do you know of a good way to fix that?
  3. True, get_raw_metadata is not used yet. My idea was that this method should provide an interface for plugins that need access to (raw) metadata. Also, piexif does not provided formatting for metadata, so it should theoretically implement get_raw_metadata. However, this raises the question on how to decide what to display in the MetadataWidget. I have also though about renaming the methods to get_metadata_for_widget and get_metadata_internal (or something the like). What do you think?
  4. Oh, this sounds great. I was not aware of this decorator. Thanks for the hint!

Some additional thoughts:

  • I think the configuration of the sets still makes sense. Or do you see anything we should change about that?
  • Concerning testing, what should be tested?

Later on I can take care of any CI / testing if you like, [...]
I will have a look at CI/testing, but I cannot promise anything. Maybe it will be better if you sort that out 😬

I will work on it in the following days. Thanks again for your feedback. I wish you a nice weekend!

@karlch
Copy link
Owner

karlch commented Jan 6, 2023

  1. a) Understandable, I just feel like the implicit version is so much more readable in python.
    b) Unfortunately that is what I feared. One of the rare occasions I miss compiling vs dynamic importing 😂 Feel free to just write the has_... function(s) and leave the while True in gui.metadatawidget for now. I can play around with the import order in the same step as the CI / testing havoc, sounds like fun 😊 Difficult to say what needs to be done, without just doing it by changing the lines of code ..
  2. I see where you are coming from now. Do the various plugins provide the "raw" metadata in consistent format? If not, it would be really hard for another plugin to link in at that point, without assuming the backend. In which case it might as well just use the backend directly ... I also prefer the current naming, the new naming would mean that the exif plugins already know what they are used for, seems a bit backwards to me 😊

Concerning your additional thoughts:

  • If this refers to the whole [METADATA] part in the config section, I agree, it still makes a lot of sense and is very powerful as-is.
  • Ideally at least everything that is currently tested 😄 but you can definitely leave that part out for now if you like, and I can sort it out later, as I can't quite grasp how much havoc it will be just yet.

Thanks again for tackling this! And a great week-end to you too! 🥳

@karlch karlch mentioned this pull request Jan 6, 2023
10 tasks
@karlch karlch added this to the v0.9.0 milestone Jan 7, 2023
@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jan 7, 2023

Thanks for you respond. Concerning 3.:

Do the various plugins provide the "raw" metadata in consistent format?

The way metadata are stored in images is (obviously) standardized (by Exif, IPTC and XMP), however, each package can present the data in whatever way they like. But most fields are basic types anyways. In the worst case a simple regex is probably sufficient to extract the data from the string value, provided by the backend.

[...] it might as well just use the backend directly

If somehow possible, I would like to prevent that any (future) plugin that requires metadata support, has to deal with the backend (piexif, exiv2, py3exiv2, etc.) themselves directly. But I vision that imutils.Metadata provides and interface/API for these plugins to get all metadata functionality they need. And this functionality should include:

  • reading formatted data (for direct display)
  • reading raw data (for further processing)
  • and the more I think about it, probably also writing data to fields (e.g. for a plugin that allows for setting star ratings).

I hope that clarifies my intend a little. Let me know if this makes sense to you and what you think about it.

@karlch
Copy link
Owner

karlch commented Jan 7, 2023

That definitely makes sense, I hope the work that goes into this doesn't get out of hand though 😂

@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jan 8, 2023

Code-vise, this PR is pretty much done;

I decided to get rid of the get_raw_metadata method and I also did not implement any functionality for writing metadata. I think it is better to deal with that once there is actual demand for it. Since currently, I am not sure that the exact requirements are.

Concerning the MetadataWidget, I do not think that simply by reordering imports we are able to fix the issue. I guess some kind of callback function is needed to set the MetadataWidget as an overlay after the plugins are loaded. But I guess you will come up with a better solution than me, therefore, I will leave this up to you 😊

I am also not sure what information concerning metadata support to return in version.info.

All tests are currently broken. I will look into them and try to fix them. Did I get it correctly, we get rid of all py3exiv2 related tests? So we only test the no-metadata case and when metadata_piexif is loaded?
But as said, probably you need to help me out with that 😬 I do also not know that needs to be changed with the CI. Currently I get some linter and mypy errors that pyexiv2 cannot be imported.

I will update the docs and the changelog maybe later today or in the next few days.
Do you already know when you have time for a detailed review? You could also already review the code now if you like.

EDIT
Rebased for master

@karlch
Copy link
Owner

karlch commented Jan 8, 2023

Thanks again! 😊

This makes perfect sense, the whole PR is complex enough as-is.

You are probably right, in some way the metadata widget has to be imported after the plugins, which may mean it is added to the overlays after mw is already created.

That is a good question 🤔 Could we add a "name" and "version number" to the metadata plugins somehow, and this is then displayed?

Concerning the tests: yes, currently pyexiv2 is not tested, at least not on the CI. I can still run it locally though, since for some reason it still runs fine on my laptop 😄 With the imports, it probably still makes sense to lazy-import piexif / pyexiv2? Not sure about the errors though, will look into that later.

I'll try to keep the current momentum up, hopefully at latest next week-end in the train 😊

@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jan 8, 2023

Could we add a "name" and "version number" to the metadata plugins somehow

Do you mean name/version of the plugin or of the backend? I have thought about adding the name of the plugin too, but I was not able to come up with a clean solution that enforces clients to set this value. Using some inspect.stack() hackery, one can get the plugin name (by extracting it from the path of the executed file). Though, that is probably not very reliable.
I have thought a lot about having a Handler class that plugins can subtype (instead of having the register decorators). Maybe that is the way to go? An abstract class is not really applicable, as clients are not required to implement all methods.

With the imports, it probably still makes sense to lazy-import piexif / pyexiv2?

You mean in the two metadata plugins? Sure, I can do that.

Sure, no pressure at all! I find it great that you are able to invest again some time for vimiv! 😃

@karlch
Copy link
Owner

karlch commented Jan 8, 2023

I also had some thoughts on the class that clients can subtype. We would then somehow have to figure out what they implemented, some funky inspection I guess, and add to the implementation registry as is currently the case upon construction of the parent class. I.e. something like (does not work, don't take the names too seriously, just brainstorming):

class MetadataPiexif(MetadataBase):
     def __init__(self):
          super().__init__("piexif", piexif.version)
          
class MetadataBase:
    def __init__(self, name: str, version: str):
        self.name = name
        self.version = version
        # Check what the child implements and update registry
        
    def get_formatted_exif(self):
        raise NotImplementedError("Must be implemented by the child class")

Could work 🤔

Thanks! Only if this avoids the actual import though 😆 i.e. there is no actual call to piexif.* upon startup. This would probably break at latest if we call piexif.version or so 😢

@jcjgraf jcjgraf mentioned this pull request Jan 8, 2023
4 tasks
@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jan 8, 2023

Yeah, something like this should work. But as you say yourself, we would need to detect what methods get implemented (/overwritten).
Alternatively, we could redesign the registry, to not store list of function for each operation, but the subtypes of the plugin directly. For each of its methods, the MetadataHandler would then iterate over all subtypes, and executed their respective method. If the client did not implement that operation, the MetadataHandler gets a NotImplementedError which it can ignore.

EDIT
Having a baseclass to subtype has also the advantage that the signatures of the methods they overwrite is unambiguous. This is not given with the current approach.

Alternatively, something like this could work too:

  • Add to imutils.metadata:
class Registrar:
    def __init__(self, name: str, version: str) -> None:
        self._name = name
        self._version = version

    def register(self, operation: Operations) -> Callable[[customtypes.FuncT], customtypes.FuncT]:
        def decorator(func: customtypes.FuncT) -> customtypes.FuncT:
            # To something with self._name/self._version here
            _registry.register(operation, func)
            return func

        return decorator
  • Interface for plugins:
registrar = metadata.Registrar(name="piexif", version=piexif.VERSION)

@registrar.register(metadata.Operations.copy_metadata)
def copy_metadata(path: Path, dest: str, reset_orientation: bool = True) -> bool:
    ...

@karlch
Copy link
Owner

karlch commented Jan 8, 2023

Makes sense, looking at these two options, the baseclass - subtyping approach seems the most clean. Especially with your option of ignoring the NotImplementedError 😊

@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jan 8, 2023

Doing this, however raises the question on how to implement has_get_metadata(). It seems like we need to inspect the subclass anyways to see what methods they implement 🤔

@karlch
Copy link
Owner

karlch commented Jan 8, 2023

Yep, you are of course right, this will need inspecting anyway ...

@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jan 8, 2023

Could you by any chance take a look at the latest commit? I have refactored the registration process such that clients are required to subclass MetadataPlugin, as discussed.
The name and version are not specified in the init function, as you suggested, but as class attributes. This has the advantage that I can access these values in metadata.register, before creating any instance of the subtype class. Using MetadataPlugin.__init_subclass__ I enforce that clients set these two values.

The commit is very WIP. I have only adopted the piexif plugin so far.

From a high-level, this this look good to you?

vimiv/imutils/metadata.py Outdated Show resolved Hide resolved
The dict returned by `get_metadata` was keyed by the *truncated*
metadata key, and not the original one. As the metadata widget uses the
original keys to read out the data, nothing was displayed if long keys
are used in the config.
I did not replace the link with one to `metadata.rst` as this lead to
a weird error. I also do not think that this link is super relevant.
@jcjgraf jcjgraf force-pushed the core/metadataplugin branch 2 times, most recently from 92975bc to f62ede3 Compare June 24, 2023 13:50
If no metadata plugin has been specified, the default `metadata` plugins
loads one of `metadata_pyexiv` or `metadata_piexif` (depending on the
installed backend). If a specific backend is configured, they it is
ensured that `metadata` does not interfere.

This is achieved by deferring the loading of the `metadata` plugin to
the end of all plugins. At that point `metadata` can check if another
metadata plugin has been loaded, and if so, do nothing.
@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jun 24, 2023

I have rebased this branch for master, and updated and restructured the TODOs in the PRs description. The restructuring should make it better visible what has been done in this PR and what is left to do. This should (hopefully) facility the review of this PR [1] 😇

In addition, I have added a proposition (fb66610) to the last outstanding issue:
For backward compatibility, we want that a metadata backend is automatically loaded. This was done by the metadata plugin. However, this approach had the issue, that either metadata_piexif or metadata_pyexiv2 was always loaded, even if one explicitly loaded a different metadata plugin. I have changed that such that a default plugin is only loaded when no other metadata plugin has been loaded. Using this approach, it is still not possible to use vimiv without having any metadata plugin loaded (in case piexif or exiv2 is installed). Though, it is not that big of an issue I think. Especially should be decide to deprecate having metadata support by default.

[1] Since you have been a bit more active last week, I had the hope that you may have a look at (and review 🫣) this PR when I get it up to date again. But no pressure - take the time you need 😬

@karlch
Copy link
Owner

karlch commented Jul 5, 2023

So, I was finally able to give this another try. Excellent work, I believe almost all issues are solved and this is nearly ready to be merged.

I tried to tackle some of the last bits in the metadataplugin branch here directly for you to check (feedback / comments welcome as always) and cherry-pick / adapt accordingly:

  • Plugin loading: I slightly refactored this so metadata can deal with this directly and must not be handled specially. We can also disable autoloading by passing metadata = none in the configfile.
  • Tests: marks are now registered automatically once piexif or pyexiv2 fixtures are included. Could probably be extended beyond this single test file at some point in the future.
  • CI: I would like to get at least one runner with pyexiv2 back in. The simple adding died, so please ignore this for now, and I will continue debugging after merging.

Final remaining part would be the documentation points you mentioned.

User plugins are now loaded before app plugins, and not overridden. This
allows metadata (and any other app plugins) to be loaded later and
access any user-specific information. In this case, the user can
deactivate auto-loading by passing

metadata = none

in the plugins section of the config.
@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jul 5, 2023

Thank you very much for taking time to review this PR!

Your changes in the metadataplugin branch look good to me and I have cherry-picked them over. The CI stuff I do not get, so here I gotta fully trust you there😁 So we keep pyexiv2 in the CI for the future?

I have updated the docs. Let me know if there is anything to improve. Otherwise, I am done with this PR 😃

@karlch
Copy link
Owner

karlch commented Jul 5, 2023

Thanks a bunch for the documentation, I will either just merge or push changes here directly once I get to it, no more need for another iteration I hope 🎉

CI: would you mind removing that one commit again? This currently doesn't work unfortunately, thus the "please ignore this for now" 😉

@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jul 5, 2023

Oh, I though I should ignore that it is broken 🙈 I have remove it

Yeah, if there are not major things, feel free to correct everything yourself directly.

@karlch karlch merged commit 1aa8f19 into karlch:master Jul 5, 2023
@karlch
Copy link
Owner

karlch commented Jul 5, 2023

Merged - so ... party? 😄 🎉

@jcjgraf jcjgraf deleted the core/metadataplugin branch July 5, 2023 17:31
@jcjgraf
Copy link
Contributor Author

jcjgraf commented Jul 5, 2023

Well, that is simply amazing🎉

Metadata support has been a burden for such a long time, and how we have finally sorted it out for once and for all [1]

[1] Better do not quote me on this 🫣 😆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants